M-FAC: Efficient Matrix-Free Approximations of Second-Order Information
Efficiently approximating local curvature information of the loss function is a useful tool for the optimization and compression of deep neural networks. Yet, most existing methods to approximate second-order information have high computational or storage costs, limiting their practicality. In this work, we investigate matrix-free approaches for estimating Inverse-Hessian Vector Products (IHVPs) for the case when the Hessian can be approximated as a sum of rank-one matrices, as in the classic approximation of the Hessian by the empirical Fisher matrix. The first algorithm we propose is tailored towards network compression and can compute the IHVP for dimension $d$ given a fixed set of $m$ rank-one matrices using $O(dm^2)$ precomputation, $O(dm)$ cost for computing the IHVP and query cost $O(m)$ for computing any single element of the inverse Hessian approximation. The second algorithm targets an optimization setting, where we wish to compute the product between the inverse Hessian, estimated over a sliding window of optimization steps, and a given gradient direction. We give an algorithm with cost $O(dm + m^2)$ for computing the IHVP and $O(dm + m^3)$ for adding or removing any gradient from the sliding window. We show that both algorithms yield competitive results for network pruning and optimization, respectively, with significantly lower computational overhead relative to existing second-order methods.
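The costs quoted above can be made concrete with a small sketch. Assuming the empirical-Fisher approximation $H \approx \lambda I + \frac{1}{m}\sum_{i=1}^{m} g_i g_i^\top$ (the damping term $\lambda$ and the function name below are our own illustrative choices, not the paper's exact algorithm), the IHVP can be computed matrix-free via $m$ nested Sherman-Morrison rank-one updates:

```python
import numpy as np

def ihvp(grads, v, lam=1e-3):
    """Matrix-free H^{-1} v for H = lam*I + (1/m) * sum_i g_i g_i^T,
    built from m successive Sherman-Morrison rank-one updates."""
    m = len(grads)
    c = 1.0 / m
    # Precompute u_k = A_{k-1}^{-1} g_k, where A_0 = lam*I: O(d m^2) total.
    us, denoms = [], []
    for k, g in enumerate(grads):
        u = g / lam
        for uj, dj, gj in zip(us, denoms, grads[:k]):
            u = u - c * uj * (gj @ u) / dj
        us.append(u)
        denoms.append(1.0 + c * (g @ u))
    # Apply the same recursion to the query vector v: O(d m) per query.
    x = v / lam
    for u, d_k, g in zip(us, denoms, grads):
        x = x - c * u * (g @ x) / d_k
    return x
```

Splitting off the precomputation (the `us`/`denoms` lists depend only on the gradients, not on `v`) is what yields the $O(dm^2)$-setup versus $O(dm)$-per-query split mentioned in the abstract.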
Nonconvex One-bit Single-label Multi-label Learning
Shuang Qiu, Tingjin Luo, Jieping Ye, Ming Lin
An important topic in multi-label learning research is how to exploit the relationships between different classes of labels in order to improve learning accuracy or reduce the number of required labels. When labels are partially observed, the low-rank matrix model is one of the most popular models for dealing with missing labels. As human labeling is usually expensive and time-consuming, it is critical to design a robust algorithm that can learn the underlying low-rank matrix model on datasets with noisy, heavily missing labels. In this work, we consider an extreme scenario where each training instance has only one single label annotated as 1 (in binary) out of multiple classes of labels. This scenario is often encountered in real-world systems but less discussed in the literature. For example, it is rare for a user to annotate a news article or a piece of music with many tags, especially when the user is not paid for their annotations. The problem becomes challenging when we have a large number of features and classes. Over the past decades, a number of multi-label learning approaches have been proposed under different settings.
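To make the observation model concrete, here is a hypothetical simulation of the "one positive label per instance" setting described above (all variable names and sizes are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
n, c, k = 100, 8, 3  # instances, label classes, latent rank

# Ground-truth labels follow a low-rank model, thresholded to binary.
scores = rng.standard_normal((n, k)) @ rng.standard_normal((k, c))
Y_full = scores > 0

# One-bit single-label observation: each instance reveals at most one
# of its positive classes; all other entries stay unobserved.
observed = np.full(n, -1)  # -1 marks "no label observed"
for i in range(n):
    positives = np.flatnonzero(Y_full[i])
    if positives.size > 0:
        observed[i] = rng.choice(positives)
```

The learning task is then to recover the full low-rank matrix `scores` (up to thresholding) from the single observed positive entry per row.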
A Non-convex One-Pass Framework for Generalized Factorization Machine and Rank-One Matrix Sensing
We develop an efficient alternating framework for learning a generalized version of the Factorization Machine (gFM) on streaming data with provable guarantees. When the instances are sampled from $d$-dimensional random Gaussian vectors and the target second-order coefficient matrix in gFM is of rank $k$, our algorithm converges linearly, achieves $O(\epsilon)$ recovery error after retrieving $O(k^{3}d\log(1/\epsilon))$ training instances, consumes $O(kd)$ memory in one pass over the dataset, and requires only matrix-vector product operations in each iteration. The key ingredient of our framework is the construction of an estimation sequence endowed with a so-called Conditionally Independent RIP condition (CI-RIP). As special cases of gFM, our framework can be applied to symmetric or asymmetric rank-one matrix sensing problems, such as inductive matrix completion and phase retrieval.
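The symmetric rank-$k$ sensing setting can be sketched as follows. This is not the paper's alternating CI-RIP algorithm; it is plain gradient descent on the factored objective, included only to illustrate the $O(kd)$-memory, matrix-vector-product-only regime (all names are our own):

```python
import numpy as np

def sense(X, U):
    # Quadratic measurements y_i = x_i^T (U U^T) x_i, computed with
    # matrix-vector products only (the d x d matrix U U^T is never formed).
    Z = X @ U                  # n x k
    return np.sum(Z * Z, axis=1)

def recover(X, y, k, steps=300, lr=0.01, seed=0):
    """Gradient descent on f(U) = 1/(2n) sum_i (x_i^T U U^T x_i - y_i)^2,
    storing only the d x k factor U: O(kd) memory."""
    n, d = X.shape
    U = 0.1 * np.random.default_rng(seed).standard_normal((d, k))
    for _ in range(steps):
        Z = X @ U
        r = np.sum(Z * Z, axis=1) - y            # residuals, length n
        U -= lr * (2.0 / n) * (X.T @ (r[:, None] * Z))
    return U
```

Each iteration touches the data only through products of the form `X @ U` and `X.T @ (...)`, which is the sense in which such methods avoid materializing any $d \times d$ second-order matrix.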